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Multivariate time series present many challenges, especially when 
they are high dimensional. The paper’s focus is twofold. First, we 
address the subject of consistently estimating the autocovariance se¬ 
quence; this is a sequence of matrices that we conveniently stack 
into one huge matrix. We are then able to show consistency of an 
estimator based on the so-called flat-top tapers', most importantly, 
the consistency holds true even when the time series dimension is 
allowed to increase with the sample size. Second, we revisit the lin¬ 
ear process bootstrap (LPB) procedure proposed by McMurry and 
Politis [J. Time Series Anal. 31 (2010) 471-482] for univariate time 
series. Based on the aforementioned stacked autocovariance matrix 
estimator, we are able to dehne a version of the LPB that is valid for 
multivariate time series. Under rather general assumptions, we show 
that our multivariate linear process bootstrap (MLPB) has asymp¬ 
totic validity for the sample mean in two important cases: (a) when 
the time series dimension is hxed and (b) when it is allowed to in¬ 
crease with sample size. As an aside, in case (a) we show that the 
MLPB works also for spectral density estimators which is a novel re¬ 
sult even in the univariate case. We conclude with a simulation study 
that demonstrates the superiority of the MLPB in some important 
cases. 

1. Introduction. Resampling methods for dependent data such as time 
series have been studied extensively over the last decades. For an overview of 
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existing bootstrap methods see the monograph of Lahiri (2003) and the re¬ 
view papers by Biihlmann (2002), Paparoditis (2002), Hardle, Horowitz and 
Kreiss (2003), Politis (2003a) or the recent review paper by Kreiss and Pa¬ 
paroditis (2011). Among the most popular bootstrap procedures in time se¬ 
ries analysis, we mention the autoregressive (AR) sieve bootstrap [cf. Kreiss 
(1992, 1999), Biihlmann (1997), Kreiss, Paparoditis and Politis (2011)] and 
block bootstrap and its variations; cf. Kiinsch (1989), Liu and Singh (1992), 
Politis and Romano (1992, 1994), etc. A recent addition to the available 
time series bootstrap methods was the linear process bootstrap (LPB) in¬ 
troduced by McMurry and Politis (2010) who showed its validity for the 
sample mean for univariate stationary processes without actually assuming 
linearity of the underlying process. 

The main idea of the LPB is to consider the time series data of length n 
as one large n-dimensional vector and to estimate appropriately the entire 
covariance structure of this vector. This is executed by using tapered covari¬ 
ance matrix estimators based on flat-top kernels that were defined in Politis 
(2001). The resulting covariance matrix is used to whiten the data by pre¬ 
multiplying the original (centered) data with its inverse Cholesky matrix; 
a modification of the eigenvalues, if necessary, ensures positive definiteness. 
This decorrelation property is illustrated in Figures 5 and 6 in Jentsch and 
Politis (2013). After suitable centering and standardizing, the whitened vec¬ 
tor is treated as having independent and identically distributed (i.i.d.) com¬ 
ponents with zero mean and unit variance. Finally, i.i.d. resampling from 
this vector and pre-multiplying the corresponding bootstrap vector of resid¬ 
uals with the Cholesky matrix itself results in a bootstrap sample that has 
(approximately) the same covariance structure as the original time series. 

Due to the use of flat-top kernels with compact support, an abruptly 
dying-out autocovariance structure is induced to the bootstrap residuals. 
Therefore, the LPB is particularly suitable for—but not limited to—time 
series of moving average (MA) type. In a sense, the LPB could be consid¬ 
ered the closest analog to an MA-sieve bootstrap which is not practically 
feasible due to nonlinearities in the estimation of the MA parameters. A 
further similarity of the LPB to MA fitting, at least in the univariate case, 
is the equivalence of computing the Cholesky decomposition of the covari¬ 
ance matrix to the innovations algorithm; cf. Rissanen and Barbosa (1969), 
Brockwell and Davis (1988) and Mitchell and Brockwell (1997), the latter 
addressing the multivariate case. 

Typically, bootstrap methods extend easily from the univariate to the 
multivariate case, and the same is true for time series bootstrap procedures 
such as the aforementioned AR-sieve bootstrap and the block bootstrap. By 
contrast, it has not been clear to date if/how the LPB could be successfully 
applied in the context of multivariate time series data; a proposal to that 
effect was described in Jentsch and Politis (2013) —who refer to an earlier 
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preprint of the paper at hand—but it has been unclear to date whether the 
multivariate LPB is asymptotically consistent and/or if it competes well with 
other methods. Here we attempt to fill this gap; we show how to implement 
the LPB in a multivariate context and prove its validity for the sample 
mean and for spectral density estimators, the latter being a new result even 
in the univariate case. Note that the limiting distributions of the sample 
mean and of kernel spectral density estimators depend only on the second- 
order moment structure. Hence it is intuitive that the LPB would be well 
suited for such statistics since it generates a linear process in the bootstrap 
world that mimics well the second-order moment structure of the real world. 
Furthermore, in the spirit of the times, we consider the possibility that the 
time series dimension is increasing with sample size and identify conditions 
under which the multivariate linear process bootstrap (MLPB) maintains its 
asymptotic validity, even in this case. The key here is to address the subject 
of consistently estimating the autocovariance sequence; this is a sequence of 
matrices that we conveniently stack into one huge matrix. We are then able 
to show consistency of an estimator based on the aforementioned flat-top 
tapers; most importantly, the consistency holds true even when the time 
series dimension is allowed to increase with the sample size. 

The paper is organized as follows. In Section 2, we introduce the notation 
of this paper, discuss tapered covariance matrix estimation for multivariate 
stationary time series and state assumptions used throughout the paper; we 
then present our results on convergence with respect to operator norm of 
tapered covariance matrix estimators. The MLPB bootstrap algorithm and 
some remarks can be found in Section 3, and results concerned with validity 
of the MLPB for the sample mean and kernel spectral density estimates 
are summarized in Section 4. Asymptotic results established for the case 
of increasing time series dimension are stated in Section 5, where operator 
norm consistency of tapered covariance matrix estimates and a validity re¬ 
sult for the sample mean are discussed. A finite-sample simulation study is 
presented in Section 6. Finally, all proofs, some additional simulations and 
a real data example on the weighted mean of an increasing number of stock 
prices taken from the German stock index DAX can be found at the paper’s 
supplementary material [Jentsch and Politis (2015)], which is also available 
at http://www.math.ucsd.edu/~politis/PAPER/]yiLPBsupplement.pdf. 

2. Preliminaries. Suppose we consider an M'^-valued time series process 
{ Xf ,t G Z} with = (Ai^t,... and we have data A^,..., A„ at 

hand. The process {Aj,t G Z} is assumed to be strictly stationary and its 
{d X d) autocovariance matrix C{h) = at lag /i G Z is 

C(L) = E((A,+,-^)(A,-^)'^), 


(2.1) 
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where u = E(Xf), and the sample autocovariance C{h) = 
at lag \h\ <n is defined by 


m.m{n,n—h) 

(2.2) = ^ E iXt+h-X){Xt-Xf, 

t=max(l,l—/i) 


where X = ~ 'Xt=i —t d-variate sample mean vector. Here and through¬ 

out the paper, all matrix-valued quantities are written as bold letters, all 
vector-valued quantities are underlined, indicates the transpose of a ma¬ 
trix A, A the complex conjugate of A and = A denotes the transposed 
conjugate of A. Note that it is also possible to use unbiased sample autoco¬ 
variances, that is, having n — \h\ instead of n in the denominator of (2.2). 
Usually the biased version as dehned in (2.2) is preferred because it guar¬ 
antees a positive semi-dehnite estimated autocovariance function, but our 
tapered covariance matrix estimator discussed in Section 2.2 is adjusted in 
order to become positive dehnite in any case. 

Now, let 2L = vec(X) = (Xi,..., X^n)'^ be the hn-dimensional vectorized 
version of the {d x n) data matrix X = [X^ 1 X 2 : • • • :X„], and denote the 
covariance matrix of X, which is symmetric block Toeplitz, by that is. 


(2.3) 


\i,j = l,...,nJ \ij = l,...,dnJ ' 


where Tdnihj) = Cov(Xj,Xj) is the covariance between the fth and jth en- 
try of X. Note that the second order stationarity of {X^,t G Z} does not 
imply second-order stationary behavior of the vectorized dn-dimensional 
data sequence X. This means that the covariances Tdnihj) truly depend 
on both i and j and not only on the difference i — j. However, the follow¬ 
ing one-to-one correspondence between {Cij{h),h G Z,z,j = l,...,d} and 
{^dn{hj)^hj G Z} holds true. Precisely, we have 


ran{i,j) = CoviXi,Xj) 

(2.4) Cov(X^j^(j) „j2(j)) 

(i.i) (^2 (b J )) ; 

where = ("ii(^) 5 "ii(j)) and 1712 ( 1 , j) = uT, 2 (f) —'ni 2 (j) with mi(k) = 

(k — l)modh-|- 1 and m 2 (k) = \k/d\, and \x\ denotes the smallest integer 
greater or equal to a; G M. 

If one is interested in estimating the quantity T^n, h seems natural to 
plug in the sample covariances C(i — j) and Tdnihj) = C'mH*h)(^ 2 (h j)) in 
Tdn and to use 

c(i-j) A / rdn{i,j) 

i,j = l,...,n) \i,j = l,...,dn 
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But unfortunately this estimator is not a consistent estimator for in 
the sense that the operator norm of does not converge to zero. 

This was shown by Wu and Pourahmadi (2009), and to dissolve this prob¬ 
lem in the univariate case, they proposed a banded estimator of the sample 
covariance matrix to achieve consistency. This has been generalized by Mc- 
Murry and Politis (2010), who considered general flat-top kernels as weight 
functions. 

In Section 2.2, we follow the paper of McMnrry and Politis (2010) and pro¬ 
pose a tapered estimator of T^n and show its consistency in Theorem 2.1 for 
the case of mnltivariate processes. Moreover, we state a modified estimator 
that is guaranteed to be positive definite for any finite sample size and show 
its consistency in Theorem 2.2 and of related quantities in Corollary 2.1. 
But prior to this, we state the assumptions that are used throughout this 
paper in the following. 


2.1. Assumptions. 

(Al) G Z} is an M'^-valued strictly stationary time series process 

with mean E(Xf) = p, and antocovariances C(h) defined in (2.1) such that 
Yl’h=-oo l^l^|C(/i)|i < oo for some s' > 0 to be further specified. Let | A|p = 
\ for some matrix A = (ojj). 

(A2) There exists a constant M < oo such that for all n G N, all h with 
\h\ <n and all i,j = l,...,d,we have 


Y,iXi,t+h - -Xj)- nCijih) 

t=i 


< My/n, 

2 


where ||A||p = (FidA|p))^/P. 

(A3) There exists an no G N large enough such that for all n > no the 
eigenvalues Ai,..., X^n of the (dn x dn) covariance matrix Fare bounded 
uniformly away from zero. 

(A4) Define the projection operator Pi,(X) = E{X_\iFk) — E{2QEk-i) for 
Xk = t(Xj, t<k), and snppose that for alH = 1,..., d, we have Ylm=o ll-^o x 
Xi^mWq < OO and \\Xi — //dig = respectively, for some g > 2 to be 

further specified. 

(A5) For the sample mean, a CLT holds true. That is, we have 


^(Z-/r) AaA(0,V), 


T) 

where —> denotes weak convergence, AA(0, V) is a normal distribution with 
zero mean vector and covariance matrix V = Yl'^=-oo^W with V positive 
definite. 
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(A6) For kernel spectral density estimates fpq{oj) as defined in (4.2) in 
Section 4, a CLT holds true. That is, for arbitrary frequencies 0 < wi,..., < 

vr, we have that 

converges to an sd^-dimensional normal distribution for 6 —>• 0 and n6 —)• oo 
such that nb^ = 0(1) as n ^ oo, where the limiting covariance matrix is 
obtained from 

nbCov{fpq{uj)Jrs{X)) 

~ifpri^)fqs{^)du)^\~\~ fpsi^)fqri^)TO,TT)'^^ J (u) du + o(l) 
and the limiting bias from 

E{fpq{u})) - fpqiuj) = b^f”q{uj)^ J K{u)u^ du + o(6^) 

for all p, g, r, s = 1,..., d, where = 1 if w = A and ro, 7 r = 1 if w = A G 
{0,7r} and zero otherwise, respectively. Therefore, f(a;) is assumed to be 
component-wise twice differentiable with Lipschitz-continuous second deriva¬ 
tives. 

Assumption (Al) is quite standard, and the uniform convergence of sam¬ 
ple autocovariances in (A2) is satisfied under different types of conditions 
(cf. Remark 2.1 below) and appears to be a crucial condition here. The uni¬ 
form boundedness of all eigenvalues away from zero in (A3) is implied by a 
nonsingular spectral density matrix f of (X^.t G Z). This follows with (2.3) 
and the inversion formula from 

for all c G where ... ,6“®”"'^) (8i Id and (8) denotes the Kro- 

necker product. The requirement of condition (A3) fits into the theory for 
the univariate autoregressive sieve bootstrap as obtained in Kreiss, Papar- 
oditis and Politis (2011). Similarly, a nonsingular spectral density matrix 
f implies positive definiteness of the long-run variance V = 27rf(0) defined 
in (A5). Assumption (A4) is, for instance, fulfilled if the underlying pro¬ 
cess is linear or a-mixing with summable mixing coefficients by Ibragimov’s 
inequality; cf., for example, Davidson (1994), Theorem 14.2. To achieve va¬ 
lidity of the MLPB for the sample mean and for kernel spectral density 
estimates in Section 4, we have to assume unconditional CLTs in (A5) and 
(A6), which are satisfied also under certain mixing conditions [cf. Doukhan 
(1994), Brillinger (1981)], linearity [cf. Brockwell and Davis (1991), Han¬ 
nan (1970)1 or weak dependence [cf. Dedecker et al. (2007)]. Note also that 


J^f(a;)J^da; c > 27r|c|2 inf Amm(f(w)) 
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the condition nb^ = 0(1) includes the optimal bandwidth choice nb^ —)• O^, 
O > 0 for second-order kernels, which leads to a nonvanishing bias in the 
limiting normal distribution. 

Remark 2.1. Assumption (A2) is implied by different types of condi¬ 
tions imposed on the underlying process C Z}. We present sufficient 

conditions for (A2) under linearity, mixing or weak dependence type condi¬ 
tions. More precisely, (A2) is satisfied if the process G Z} fulfills one 

of the following conditions: 

(i) Linearity. Suppose the process is linear; that is, Aj = YlV=-oo 

t G Z, where {ej,t G Z} is an i.i.d. white noise with finite fourth moments 
E{ei^tej,tek,tei,t) < oo for all i,j,k,l = l,...,d and the sequence of {d x d) 
coefficient matrices {Bfc,/c G Z} is component-wise absolutely summable. 

(ii) Mixing-type condition. Let 

cumai^...,ak (ui,..., Uk-i) = cum(Aaj Xa„0) 

denote the A:th order joint cumulant of ..., Aaj,^o [cf- 

Brillinger (1981)], and suppose E h, s,h)\ < oo for all 

i,j = l,...,d. Note that this is satisfied if {A^, t G Z} is a-mixing such that 
£^(|Ai| 2 ^'^) < oo and < oo for some (i > 0; cf. Shao (2010), 

page 221. 

(hi) Weak dependence-type condition. Suppose for all i,j = l,...,(i, we 
have 

ICov((Aj i /^j)) i^i,t-es-eh w')(^^j,t-\-s /^j))l — E • Vs,h^ 

where C <oo and {ns,h,s,h G Z) is absolutely summable, that is, 

oo 

|z/s,/i|<oo; 

s,h=—oo 

cf. Dedecker et al. (2007). 

2.2. Tapered covariance matrix estimation of multiple time series data. 
To adopt the technique of McMurry and Politis (2010), let 

( 1, \x\ < 1, 

(2.5) k{x) = < 0, |x| > Ck, 

[^dxj), otherwise 

be a so-called flat-top taper [cf. Politis (2001)], where \g{x)\ < 1 and > 1. 
The /-scaled version of k{-) is defined by nflx) = k(j) for some / > 0. As 
Politis (2011) argues, it is advantageous to have a smooth taper k, so the 
truncated kernel that corresponds to g{x) = 0 for all x is not recommended. 
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The simplest example of a continuous taper function k with > 1 is the 
trapezoid 


( 2 . 6 ) 


k{x) 


1, |x| < 1, 

2-|x|, l<|x|<2, 

0, |x| > 2 


which is used in Section 6 for the simulation study; the trapezoidal taper was 
first proposed by Politis and Romano (1995) in a spectral estimation setup. 
Observe also that the banding parameter does not need to be an integer. 
The tapered estimator of F^^^ is given by 


(2.7) 


f Ki{i - j)C{i - j)\ ^ \ 

V = J \ij = l,...,dnj ’ 


where j)) and C^j{h) = Ki{h)Cij{h). 

The following Theorem 2.1 deals with consistency of the tapered estimator 
Fk,/ with respect to operator norm convergence. It extends Theorem 1 in 
McMurry and Politis (2010) to the multivariate case and does not rely on 
the concept of physical dependence only. The operator norm of a complex¬ 
valued {d X d) matrix A is defined by 


/9(A) = max |Ax| 2 , 
xGC'^ : |2;|2 = 1 

and it is well known that / 0 ^(A) = Amax(A'^A) = Amax(AA'^), where A max (B) 
denotes the largest eigenvalue of a matrix B; cf. Horn and Johnson (1990), 
page 296. 


Theorem 2.1. Suppose that assumptions (Al) with g = 0 and (A2) are 
satisfied. Then it holds 


( 2 . 8 ) 


l|p(r're,/ Fc /^)||2 


< 


4Md\[cJ\ + l) 


n 


\XkI\ |, I n—1 

^ ^ ^ ^ |C(h)|;^. 

h=0 h=l-\-l 


The second term on the right-hand side of (2.8) can be represented as 
^-^^|C(/i)|i and vanishes asymptotically due to the Kronecker 
lemma and is of order o{l/n). The third one converges to zero for I = fin) —^ 
oo as n —)■ oo and the leading first term for I = o{^/n). Hence, for 1/1 + 
l/^/n = o{l), the right-hand side of (2.8) vanishes asymptotically. However, 
if |C(h)|i = 0 for \h\ > ho for some ho € N, setting I = ho fixed suffices. In 
this case, the expression on the right-hand side of (2.8) is of faster order 
0(n“^/^). 
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As already pointed out by McMurry and Politis (2010), the tapered esti¬ 
mator Pk,; is not guaranteed to be positive semi-definite or even to be posi¬ 
tive definite for finite sample sizes. However, ^ is at least “asymptotically 
positive definite” under assumption (A3) and due to (2.8) if Ijl = 

o(l) holds. In the following, we require a consistent estimator for which is 
positive dehnite for all hnite sample sizes to be able to compute its Cholesky 
decomposition for the linear process bootstrap scheme that will be intro¬ 
duced in Section 3 below. 

To obtain an estimator of Vdn related to P^,; that is assured to be pos¬ 
itive dehnite for all sample sizes, we construct a modihed estimator P^; 
in the following. Let V = diag(Prf„) be the diagonal matrix of sample vari¬ 
ances, and dehne Now we consider the spectral fac¬ 

torization R^ ^ = SDS^, where S is an (dn x dn) orthogonal matrix and 
D = diag(ri,...,r^n) is the diagonal matrix containing the eigenvalues of 
R^,; such that ri > r 2 > • • • > It is worth noting that this factoriza¬ 
tion always exists due to symmetry of R^,/, but that the eigenvalues can be 
positive, zero or even negative. Now, dehne 

(2.9) f ^ ^ 

where = diag(r^,... jT^^) and r| = max(rj,en“l^). Here, jd > 1/2 and e > 
0 are user dehned constants that ensure the positive dehniteness of P^ z ■ 
Contrary to the univariate case discussed in McMurry and Politis (2010), 
we propose to adjust the eigenvalues of the (equivariant) correlation matrix 
'R-k.i instead of Pk,u which then comes along without a scaling factor in the 
dehnition of rf. Further, note that setting e = 0 leads to a positive semi- 
dehnite estimate if P^,; is indehnite, which does not sufhce for computing 
the Cholesky decomposition, and also that P^; generally loses the banded 
shape of P^^z. Theorem 2.2 below, which extends Theorem 3 in McMurry 
and Politis (2010), shows that the modification of the eigenvalues does affect 
the convergence results obtained in Theorem 2.1 just slightly. 


Theorem 2.2. Under the assumptions of Theorem 2.1, it holds 


( 2 . 10 ) 


(X K,i r ' c (,^)||2 

< ^+ ^) + 4 V ^IC(h)I, + 4 V \Cih)\, 

Jn n 

'' h=0 h=l+l 


+ emaxai(0)n ^ + 
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In comparison to the upper bound established in Theorem 2.1, two more 
terms appear on the right-hand side of ( 2 . 10 ) which do converge as well 
to zero as n tends to infinity. Note that the first three summands that the 
right-hand sides of ( 2 . 8 ) and ( 2 . 10 ) have in common, remain the leading 
terms if /3 > ^. 

We also need convergence and boundedness in operator norm of quan¬ 
tities related to The required results are summarized in the following 
corollary. 


Corollary 2.1. 
we have: 

(i) K,l ~ Tdri) 
where 

( 2 . 11 ) 


Under assumptions (Al) with g = 0, (A 2 ) and (A3), 
and — T^^) are terms of order Op{ri^n), 

j OO 

rpn = -^+ |C(Mll> 

h=l+l 


andrpn = o{l) if \/l ^1/^fn = oifU). 

(ii) and are of order Op{\og^{n) x 

ri^n) log^(n)r;^„ = o(l) if 1/1+ \o^{n)l/^/n = o{l), and (Al) holds for 
some g > 0. 

(hi) p{Tdn), p(^dn)’ p(^dn )’ P(^dn ) are bounded from above and be¬ 
low. p(f^,z), p((fl,i)"^) and p((f^,)-V2)^ p((f^;)V2) are bounded from 
above and below (in probability) if ri^n = o{l) and log^(n)n^„ = o(l), respec¬ 
tively. 


Remark 2.2. In Section 2.2, we propose to use a global banding pa¬ 
rameter I that down-weights the autocovariance matrices for increasing lag; 
that is, the entire matrix C(/i) is multiplied with the same K.i{h) in (2.7). 
However, it is possible to use individual banding parameters Ipq for each se¬ 
quence of entries {Cpq{h),h G Z}, p, <7 = 1,..., d as proposed in Politis (2011), 
compare also the simulation section. 

2.3. Selection of tuning parameters. To get a tapered estimate T^,; of 
the covariance matrix T^^, some parameters have to be chosen by the prac¬ 
titioner. These are the flat-top taper k and the banding parameter I, which 
are both responsible for the down-weighting of the empirical autocovariances 
C(/i) with increasing lag h. 

To select a suitable taper k from the class of functions (2.5), we have to 
select Ck > 1 and the function g which determine the range of the decay of 
K to zero for |x| > 1 and its form over this range, respectively. For some 
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examples of flat-top tapers, compare Politis (2003b, 2011). However, the se¬ 
lection of the banding parameter I appears to be more crucial than choosing 
the tapering function k among the family of well-behaved flat-top kernels 
as discussed in Politis (2011). This is comparable to nonparametric kernel 
estimation where usually the bandwidth plays a more important role than 
the shape of the kernel. 

We focus on providing an empirical rule for banding parameter selection 
that has already been used in McMurry and Politis (2010) for the univariate 
LPB. They make use of an approach primarily proposed in Politis (2003b) 
to estimate the bandwidth in spectral density estimation which has been 
generalized to the multivariate case in Politis (2011). In the following, we 
adopt this technique based on the correlogram/cross-correlogram [cf. Politis 
(2011, Section 6)] for our purposes. Let 



( 2 . 12 ) 


be the sample (cross-)correlation between the two univariate time series 
C 2) and C ^) at lag /i G Z. Now, define qjk as the smallest 

nonnegative integer such that 


\Rjk{qjk + h)\< MoA/logio(n)/n 


for h = 1,..., Kn, where Mq > 0 is a fixed constant, and Kn is a positive, 
nondecreasing integer-valued function of n such that Kn = o(log(re)). Note 
that the constant Mq and the form of Kn are the practitioner’s choice. As a 
rule of thumb, we refer to Politis (2003b, 2011) who makes the concrete rec¬ 
ommendation Mq ~ 2 and = max(5, Y^log]^Q(n)). After having computed 
Qjk for all j. A: = 1,..., d, we take 


(2.13) 


max Qjk 
j,k=l,...,d 


as a data-driven global choice of the banding parameter 1. By setting = 
Qjk, we get data-driven individual banding parameter choices as discussed 
in Remark 2.2. For theoretical justification of this empirical selection of a 
global cut-off point as the maximum over individual choices and assumptions 
that lead to successful adaptation, we refer to Theorem 6.1 in Politis (2011). 

Note also that for positive definite covariance matrix estimation, that 
is, for computing T^ ;, one has to select two more parameters e and /?, 


which have to be nonnegative and might be set equal to one as suggested in 
McMurry and Politis (2010). 
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3. The multivariate linear process bootstrap procedure. In this section, 
we describe the multivariate linear process bootstrap (MLPB) in detail, dis¬ 
cuss some modifications and comment on the special case where the tapered 
covariance estimator becomes diagonal. 

Step 1. Let X be the {d x n) data matrix consisting of valued time 
series data 2Li, ■ ■ ■ :2Ln of sample size n. Compute the centered observa¬ 
tions Y_j- = 2Lt ~ where X = let Y be the corresponding 

{d X n) matrix of centered observations, and define Y = vec(Y) to be the 
dn-dimensional vectorized version of Y. 

Step 2. Compute IT = where denotes the lower left 

triangular matrix L of the Cholesky decomposition F^ i = LL^. 

Step 3. Let _Z be the standardized version of IF, that is, Z* = ^ 

i = l,...,dn, where W = ^ J2t=i and aw = ^ - W)'^. 

Step 4. Generate Z* = (Zj^,..., Z^^)^ by i.i.d. resampling from {Zi,..., 
Zrfn}- 

Step 5. Compute Y_* = (F^ ;)^/^Z*, and let Y* be the matrix that is ob¬ 
tained from Y_* by putting this vector column-wise into an (d x n) matrix, 
and denote its columns by Y^,..., YJ^. 

Regarding steps 3 and 4 above and due to the multivariate nature of the 
data, it appears to be even more natural to split the dn-dimensional vector 
Z in step 3 above in n sub-vectors, to center and standardize them and to 
apply i.i.d. resampling to these vectors to get Z*. More precisely, steps 3 
and 4 can be replaced by: 

Step 3'. Let Z = (Z^,..., Z^)^ be the standardized version of that 
is, Z, = - W), Wher^ = {Wj ,... , W=^ E?=iKt and 

= i - mm - wf- 

Step 4'. Generate Z* = (Z^,..., Z* by i.i.d. resampling from {Z^,..., 

This might preserve more higher order features of the data that are not 
captured by F^;. However, comparative simulations (not reported in the 
paper) indicate that the finite sample performance is only slightly affected 
by this sub-vector resampling. 

Remark 3.1. If 0 < / < —, the banded covariance matrix estimator 
Fk,/ (and F^^^ as well) becomes diagonal. In this case and if steps 3' and 4' 
are used, the LPB as described above is equivalent to the classical i.i.d. 
bootstrap. Here, note the similarity to the autoregressive sieve bootstrap 
which boils down to an i.i.d. bootstrap if the autoregressive order is p = 0. 
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4. Bootstrap consistency for fixed time series dimension. 

4.1. Sample mean. In this section, we establish validity of the MLPB for 
the sample mean. The following theorem generalizes Theorem 5 of McMurry 
and Politis (2010) to the multivariate case under somewhat more general 
conditions. 

Theorem 4.1. Under assumptions (Al) for some 5 > 0, (A2), (A3), 
(A4) forq = 4:, (A5) and 1/1+ log^{n)l/y/n = o{l), the MLPB is asymptot¬ 
ically valid for the sample mean A, that is, 

sup |P{v^(A — /.t) < x} — P*{y/nY* < x}| = op(l) 

and \aT*{y/nY_*) = ^//L-oo^ih) + op{l), where Y_* = ^ short¬ 

hand x<y for X, y G is used to denote Xi < yt for all i = 1,... ,d. 

4.2. Kernel spectral density estimates. Here we prove consistency of the 

MLPB for kernel spectral density matrix estimators; this result is novel even 
in the univariate case. Let In(w) = {^) periodogram matrix, 

where 

1 "" 

(-i-i) iH = ^5=ELe-““ 

v 2 vrn 

is the discrete Fourier transform (DFT) of Y_i,... ,T„, = X_i — A. We 

define the estimator 

1 

(4.2) f(a;) = - ^ Kb{u - ujk)ln{^k) 

fc=-L(n-l)/ 2 j 

for the spectral density matrix f(u;), where [xj is the integer part of x G M, 
ujk = 2TT^,k = — ,..., [f J are the Fourier frequencies, b is the band¬ 

width and K is a symmetric and square integrable kernel function A(-) that 
satisfies f K(x)dx = 27r and f K(u)u^du < 00 and we set Kb(-) = \K{^). 
Let Ini^) tie the bootstrap analogue of In(w) based on To • ■ • iY.n generated 
from the MLPB scheme and let f*(ci;) be the bootstrap analogue of f(w). 

Theorem 4.2. Suppose assumptions (Al) with g >0 specified below, 
(A 2 ), (A3), (A4) for q = 8 and (A 6 ) are satisfied. // 6 —>• 0 and nb —>• 
00 such that nb^ = 0 ( 1 ) as well as 1/1 \/Wlog^(n) + Vnblog^{n)/U and 
l/k-\-bk‘^-\-VT^^og‘^{n)/k^ for some sequence k = k{n), the MLPB is asymp¬ 
totically valid for kernel spectral density estimates f(t<;). That is, for a/Z s G N 
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and arbitrary frequencies 0 < wi,..., < tt (not necessarily Fourier frequen¬ 

cies), it holds 

sup \P{{Vnb{fpq{ujj) - fpqiujj)):p, q = 1,..., d] j = 1,..., s) < x} 

- P*{{VnI>ifpq{ooj) - fpqiooj)):p, q = 1,..., d] j = 1,..., s) < x}\ 

= op{l), 

where fpq{oj) = ^Yl'lZ}_^^_i~^Ki{h)Cpq{h)e~^^^ and, in particular, 
nbCov*{f;q{u;)j;,iX)) 

~ifpri^)fqs{^)^u),Xp fpsi^)fqri^)TO,TT)'^^ J P {u)dupOp{l), 

and E*{f*g{uj))-fpq{u)) = b'^f”g{uj)^jK(u)u^ du+ppib"^), for allp,q,r,s = 
1,... ,d and all w, A £ [0,vr], respectively. 


4.3. Other statistics and LPB-of-blocks bootstrap. For statistics Tn con¬ 
tained in the broad class of functions of generalized means, Jentsch and 
Politis (2013) discussed how by using a preliminary blocking scheme tailor- 
made for a specific statistic of interest, the MLPB can be shown to be 
consistent. This class of statistics contains estimates T„ of w{'d) with d = 
E{g{2Lt,--- ,2£.t+m-i)) such that 

n—m+l 'I 

n-m+1 ^ = 

for some sufficiently smooth functions g: —>■ re: ^ M and fixed 

m G N. They propose to block the data first according to the known function 
g and to apply then the (M)LPB to the blocked data. More precisely, the 
multivariate LPB-of-blocks bootstrap is as follows: 

Step 1. Define := g{X^,.. .,Xt+m-i)^ and let be the 

set of blocked data. 

Step 2. Apply the MLPB scheme of Section 3 to the /c-dimensional blocked 

5(! ~ 5(! 

data A^,... ,X.n-m+i to get bootstrap observations 2 Li, ■ ■ ■ i2Ln-m+i- 

Step 3. Compute T* = w{{n — m + 1)“^ y^nj-m-n 

Step 4. Repeat steps 2 and 3 R-times, where B is large, and approximate 
the unknown distribution of ^/n{Tn — r(;('d)} by the empirical distribution 
of -Tn},..., - Tn}. 

The validity of the multivariate LPB-of-blocks bootstrap for some statistic 
Tn can be verified by checkii^ the assumptions of Theorem 4.1 for the sample 
mean of the new process {X_t,t G Z}. 
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5. Asymptotic results for increasing time series dimension. In this sec¬ 
tion, we consider the case when the time series dimension d is allowed to 
increase with the sample size n, that is, d = d{n) —>■ oo as n —>• oo. In partic¬ 
ular, we show consistency of tapered covariance matrix estimates and derive 
rates that allow for an asymptotic validity result of the MLPB for the sample 
mean in this case. 

The recent paper by Cai, Ren and Zhou (2013) gives a thorough discussion 
of the estimation of Toeplitz covariance matrices for univariate time series. 
In their setup, that covers also the possibility of having multiple datasets 
from the same data generating process, Cai, Ren and Zhou (2013) estab¬ 
lish the optimal rates of convergence using the two simple flat-top kernels 
discussed in Section 2.2, namely the truncated (i.e., case of pure banding— 
no tapering) and the trapezoid taper. When the strength of dependence is 
quantified via a smoothness condition on the spectral density, they show 
that the trapezoid is superior to the truncated taper, thus confirming the 
intuitive recommendations of Politis (2011). The asymptotic theory of Cai, 
Ren and Zhou (2013) allows for increasing number of time series and in¬ 
creasing sample size, but their framework does not contain the multivariate 
time series case, neither for fixed nor for increasing time series dimension, 
which will be discussed in this section. 

Note that Theorem 1 in McMurry and Politis (2010) for the univariate 
case, as well as our Theorem 2.1 for the multivariate case of fixed time series 
dimension, give upper bounds that are quite sharp, coming within a log-term 
to the (Gaussian) optimal rate found in Theorem 2 of Cai, Ren and Zhou 
(2013). 

Instead of assumptions (A1)-(A5) that have been introduced in Sec¬ 
tion 2.1 and used in Theorem 4.1 to obtain bootstrap consistency for the 
sample mean for fixed dimension d, we impose the following conditions on 
the sequence of time series process S Z})„gN of now increasing di¬ 

mension. 

5.1. Assumptions. 

(AT) ({ At = ■ ■ ■, t S Z})„gi^ is a sequence of M'^("’Tvalued 

strictly stationary time series processes with mean vectors E(X^) = p, = 
{pi,..., and autocovariances C(/i) = (C'jj(/i))jj=i^,,,^rf(„) dehned as in 

(2.1). Here, {d{n))n£n is a nondecreasing sequence of positive integers such 
that d{n) —>■ oo as re —>■ oo and, further, suppose 

CX) 

|sup sup \h\^\Cij{h)\\ < oo 
for some g > 0 to be further specified. 
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(A2') There exists a constant M' < oo such that for all n G N and all h 
with \h\ < n, we have 


sup 


- Xj) - nCij{h) 

t=i 


2 


(A3') There exists an no G N large enough such that for all n > no and all 
d'>do = d(no) the eigenvalues Ai,..., \dn of the {dn x dn) covariance matrix 
Tdn are bounded uniformly away from zero and from above. 


(AT) Define the sequence of projection operators (X) = i?(X|X^”^) — 
for = a (X^ , t < k), and suppose 


and 


El 


m=0 


sup sup 

nEN i=l,...,d{n) 


y 

-^0 -^i^m 



< OO 


sup sup ||Xj —/ij ||4 = 0(n ^/^). 

nEN i=l^...^d{n) 

(A5') For the sample mean, a Cramer-Wold-type CLT holds true. That is, 
for any real-valued sequence h = b{d{n)) of (i(n)-dimensional vectors with 0 < 
< \b{d{n ))\2 < M 2 < 00 for all n G N and = Var(-y/n(6^(X — 

we have 


— fd))/v ^^A^(0,1). 

Assumptions (Al')~(A4') are uniform analogues of (Al)-(A4), which are 
required here to tackle the increasing time series dimension d. In particular, 
(AT) implies 

CX) 

(5.1) \C{h)\, = 0{d^). 

h=—oo 


Observe also that the autocovariances Cij(h) are assumed to decay with 
increasing lag h, that is, in time direction, but they are not assumed to 
decay with increasing \i — j\, that is, with respect to increasing time series 
dimension. Therefore, we have to make use of square summable sequences 
in (A5') to get a CLT result. This technique has been used, for example, 
by Lewis and Reinsel (1985) and Gonqalves and Kilian (2007) to establish 
central limit results for the estimation of an increasing number of autore¬ 
gressive coefficients. A simple sufficient condition for (A5') is, for example, 
the case of ({X^ = (Xi ^^,..., Xd(^n),t)'^P £ ^})neN being a sequence of i.i.d. 
Gaussian processes with eigenvalues of EiX^Xf) bounded uniformly from 
above and away from zero. 
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5.2. Operator norm convergence for increasing time series dimension. 
The following theorem generalizes the results of Theorems 2.1 and 2.2 and of 
Corollary 2.1 to the case where d = d{n) is allowed to increase with the sam¬ 
ple size. In contrast to the case of a stationary spatial process on the plane 
J? (where a data matrix is observed that grows in both directions asymptot¬ 
ically as in our setting), we do not assume that the autocovariance matrix 
decays in all directions. Therefore, to be able to establish a meaningful the¬ 
ory, we have to replace (Al)-(A5) by the uniform analogues (Al')-(A5'), 
and due to (5.1), an additional factor df turns up in the convergence rate 
and has to be taken into account. 

Theorem 5.1. Under assumptions (Al') with g >0 specified below, 
(A2') and (A3'), we have: 

(i) /o(f - Tdn) and p((f^ ;)“^ - T^^^) are terms of order Op{d‘^ri^n), 
where 

I 

(5.2) ri^n = —j=+ ]sup sup \Cij{h)\\, 

and dfri^n = o(l) */ 1/^ + d^l/y/n + d?/U = o(l). 

(ii) - rj/^) and p((f^_;)-V2 _ are both terms of or¬ 

der Op{log'^{dn)d‘^ri^n) and log^{dn)d‘^ri^n = o(l) if ^/l + log^{dn)d‘^l/y/n-\- 
log^{dn)d‘^ /U = o(l). 

(hi) p{Tdn), pC^dn)’ pi^dn ) P(^d,n ) bounded from above and 

below. piT^ i) and p{{Tl i)-^) as well as p((fI,z)"^/^) and p((f are 
bounded from above and below in probability ifdfri^n = o(l) and log^{dn)d‘^ x 
n,n = o(l), respectively. 

The required rates for the banding parameter I and the time series di¬ 
mension d to get operator norm consistency /o(r,, ^ — T^^) = op(l) can be 
interpreted nicely. If g is chosen to be large enough, d'^l/^/n becomes the 
leading term, and there is a trade-off between capturing more dependence 
of the time series in time direction (large 1 ) and growing dimension of the 
time series in cross-sectional direction (large d). 

5.3. Bootstrap validity for increasing time series dimension. The sub¬ 
sequent theorem is a Cramer-Wold-type generalization of Theorem 4.1 to 
the case where d = d{n) is allowed to grow at an appropriate rate with the 
sample size. To tackle the increasing time series dimension and to prove 
such a CLT result, we have to make use of appropriate sequences of square 
summable vectors b = b{d{n)) as described in (A5') above. 
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Theorem 5.2. Under assumptions (Al') with g > 0 specified below, 
(A2'), (A3'), (A4') for q = ^, (A5') as well as 1/1 + log‘^{dn)d'^l/y/n + 
\o^{dn)d?/U = o(l) and l/k + dfk^/n + \og^{dn)d^/k^ = o(l) for some se¬ 
quence k = k{n), the MLPB is asymptotically valid for the sample mean 
X . That is, for any real-valued sequence b = b{d{n)) of d{n)-dimensional 
vectors with 0 < Mi < \b{d{n ))\2 < M 2 < 00 for all n and - ^d{n) - 

Var*(y^(6^y*)), we have 

sup|P{v^(^^(^ — g))/v <x} — P*{^/n{lFY^)/v < x}| = op(l) 
and \v‘^ — v^\ = op(l). 

5.4. Reduction of computation time. In practice, the computational re¬ 
quirements can become very demanding for large d and n. In this case, we 
suggest to split the data vector X_ in few subsamples ... ,XS^\ say, and 
to apply the MLPB scheme to each subsample separately. This operation 
can be justified by the fact that dependence structure is distorted only few 
times. Precisely, we suggest the following procedure: 

Step 1. For small S' G N, define rigub = \^/S'] and Aigub = <^^sub such that 
SiVsub > N, and let ..., i = 1,..., S, where 

is filled up with zeros if SN^uh > N. 

Step 2. Apply the MLPB bootstrap scheme as described in Section 3 
separately to the subsamples A^^^ ..., to get . 

Step 3. Put ..., end-to-end together, and discard the last 

SAgub — A values to get Yfi and Y*. 

Here, computationally demanding operations as eigenvalue decomposi¬ 
tion, Cholesky decomposition and matrix inversion have to be executed only 
for lower-dimensional matrices, such that the algorithm above is capable 
to reduce the computation time considerably. Further, to regain efficiency, 
we propose to use the pooled sample mean A for centering and T^ 
for whitening and re-introducing correlation structure for all subsamples in 
step 2. Here, is obtained analogously to (2.9), but based on the 

upper-left (Asub x Agub) sub-matrix of T^,/. 

6. Simulations. In this section we compare systematically the perfor¬ 
mance of the multivariate linear process bootstrap (MLPB) to that of the 
vector-autoregressive sieve bootstrap (AR-sieve), the moving block boot¬ 
strap (MBB) and the tapered block bootstrap (TBB) by means of simula¬ 
tion. In order to make such a comparison, we have chosen a statistic for which 
all methods lead to asymptotically correct approximations. Being interested 
in the distribution of the sample mean, we compare the aforementioned 
bootstrap methods by plotting: 
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(a) root mean squared errors (RMSE) for estimating the variances of 
\/n(X — fi) and 

(b) coverage rates (CR) of 95% bootstrap confidence intervals for the 
components of /i 

for two data generating processes (DGPs) and three sample sizes in two 
different setups. First, in Section 6.1, we compare the performance of all 
aforementioned bootstraps with respect to (w.r.t.) tuning parameter choice. 
These are the banding parameter I (MLPB), the autoregressive order p (AR- 
sieve) and the block length s (MBB, TBB). Furthermore, we report RMSF 
and CR for data-adaptively chosen tuning parameters to investigate how 
accurate automatic selection procedures can work in practice. Second, in 
Section 6.2, we investigate the effect of the time series dimension d on the 
performance of the different bootstrap approaches. 

For each case, we have generated T = 500 time series and B = 500 boot¬ 
strap replications have been used in each step. For (a), the exact covariance 
matrix of \/niX — p) is estimated by 20,000 Monte Carlo replications. Fur¬ 
ther, we use the trapezoidal kernel defined in (2.6) to taper the sample 
covariance matrix for the MLPB and the blocks for the TBB. To correct the 
covariance matrix estimator to be positive definite, if necessary, we set 
e = 1 and /3 = 1 to get F^ ^. This choice has already been used by McMurry 
and Politis (2010) and simulation results (not reported in this paper) indi¬ 
cate that the performance of the MLPB reacts only slightly to this choice. 
We have used the sub-vector resampling scheme, that is, steps 3' and 4' 
described in Section 3. 

Some additional simulation results and a real data application of the 
MLPB to the weighted mean of an increasing number of German stock prices 
taken from the DAX index can be found in the supplementary material to 
this paper [Jentsch and Politis (2015)]. The R code is available at http:// 
WWW .math. ucsd. edu/~politis/S0FT/f unct ion_]yiLPB. R. 

6.1. Bootstrap performance: The effect of tuning parameter choice. We 
consider realizations A^,.... X„ of length n = 100,200,500 from two bivari¬ 
ate (d = 2) DGPs. Precisely, we study a first-order vector moving average 
process 


VMA(l) model = Aej_^ -1- 

and a hrst-order vector autoregressive process 

VAR(l) model A ^ = AA ^ ^ + Cf. 

where ~ A/(0, S) is a normally distributed i.i.d. white noise process and 



and 



20 


C. JENTSCH AND D. N. POLITIS 


RMSE, n=100 


RMSE, n:::200 


RMSE, n=500 





CR, n=100 



CR, n=200 CR, n=500 




Fig. 1. RMSE for estimating'Va.r{^/n{Xi — fii)) and CR of bootstrap confidence intervals 
for fj,i by MLPB (solid), AR-sieve (dashed), MBB (dotted) and TBB (dash-dotted) are 
reported vs. the respective tuning parameters l,p, s € {1,..., 20} for the VMA(1) model with 
sample size n £ {100,200,500}. Line segments indicate results for data-adaptively chosen 
tuning parameters. MLPB with individual (grey) and global (black) banding parameter 
choice are reported. 

have been used in all cases. It is worth noting that (asymptotically) all 
bootstrap procedures under consideration yield valid approximations for 
both models above. For the VMA(l) model, MLPB is valid for all (suffi¬ 
ciently small) choices of banding parameters / > 1, but AR-sieve is valid 
only asymptotically for p = p{n) tending to infinity at an appropriate rate 
with increasing sample size n. This relationship of MLPB and AR-sieve is 
reversed for the VAR(l) model. For the MBB and the TBB, the block length 
has to increase with the sample size for both DGPs. 

In addition to the results for tuning parameters l,p,s € {!,...,20}, we 
show also RMSE and CR for tuning parameters chosen by automatic se¬ 
lection procedures in Figures 1 and 2. For the MLPB, we report results 
for data-adaptively chosen global and individual banding parameters as dis¬ 
cussed in Section 2.3. For the AR-sieve, the order of the VAR model fitted 
to the data has been chosen by using the R routine VAR(-) contained in 
the package vars with lag.max = sqrt{n/log{n)). The block length is chosen 
by using the R routine b.star{-) contained in the package np. In Figures 1 
and 2, we report only the results corresponding to the first component of the 
sample mean, as those for the second component lead qualitatively to the 
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Eig. 2. As in Figure 1, but with VAR(l) model. 


same results. We show them in the supplementary material, which contains 
also corresponding simulation results for a normal white noise DGP. 

For data generated by the VMA(l) model, Figure 1 shows that the MLPB 
outperforms AR-sieve, MBB and TBB for adequate tuning parameter choice, 
that is, / ~ 1. In this case, the MLPB generally behaves superiorly, with re¬ 
spect to RMSE and CR, to the other bootstrap methods for all tuning 
parameter choices of p and s. This was not unexpected since, by design, the 
MLPB can approximate very efficiently the covariance structure of moving 
average processes. Nevertheless, due to the fact that all proposed bootstrap 
schemes are valid at least asymptotically, AR-sieve gets rid of its bias with 
increasing order p, but at the expense of increasing variability and conse¬ 
quently also increasing RMSE. MLPB with data-adaptively chosen banding 
parameter performs quite well, where the individual choice tends to per¬ 
form superiorly to the global choice in most cases. In comparison, MBB and 
TBB seem to perform quite well for adequate block length, but they lose 
in terms of RMSE as well as CR performance if the block length is chosen 
automatically. 

The data from the VAR(I) model is highly persistent due to the coeffi¬ 
cient All = 0.9 near to unity. This leads to autocovariances that are rather 
slowly decreasing with increasing lag and, consequently, to large variances 
of \/n(X — u). Eigure 2 shows that AR-sieve outperforms MLPB, MBB and 
TBB with respect to CR for small AR orders 1. This is to be expected 
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since the underlying VAR(l) model is captured well by AR-sieve even with 
finite sample size. But the picture appears to be different with respect to 
RMSE. Here, MLPB may perform superiorly for adequate tuning parame¬ 
ter choice, but this effect can be explained by the very small variance that 
compensates its large bias, in comparison to the AR-sieve (bias and vari¬ 
ance not reported here) leading to a smaller RMSE. This phenomenon is 
also illustrated by the poor performance of MLPB with respect to CR for 
small choices of 1. However, more surprising is the rather good performance 
of the MLPB if the banding parameter is chosen data-adaptively, where the 
MLPB appears to be comparable to the AR-sieve in terms of RMSE and is at 
least close with respect to CR. Eurther, as observed already for the VMA(l) 
model in Eigure 1, the individual banding parameter choice generally tends 
to outperform the global choice here again. Similarly, it can be seen here 
that the performance of AR-sieve worsens with increasing p at the expense 
of increasing variability. The block bootstraps MBB and TBB appear to be 
clearly inferior to MLPB and AR-sieve, particularly with respect to CR, but 
also with respect to RMSE if tuning parameters are chosen automatically. 

6.2. Bootstrap performance: The effect of larger time series dimension. 
We consider d-dimensional realizations with n= 100,200,500 

from two DGPs of several dimensions. Precisely, we study first-order vector 
moving average processes 

VMA(i(l) model X_^ = + e^ 

and first-order vector autoregressive processes 

VARd(l) model = AX ^ j -I- 

of dimension d e {2,..., 10}, where ~ AA(0, is a d-dimensional nor¬ 
mally distributed i.i.d. white noise process, and S = (Ejj) and A = {Aij) are 
such that 


i = j, 

\i — j\ = 1, 

and A,o' = < 

' 0.9, 

0.5, 

i=j 

i=j 

(z-hl)/2GN, 
z/2 e N, 

otherwise 

‘'J 

-0.4, 

i +1 

= j, 


lo. 

otherwise. 


Observe that the VMA(l) and VAR(l) models considered in Section 6.1 are 
included in this setup for d = 2. 

In Figures 3 and 4, we compare the performance of MLPB, AR-sieve, 
MBB and TBB for the DGPs above using RMSE and GR averaged over 
all d time series coordinates. Precisely, we compute RMSE individually for 
the estimates of Var(-y/n(Aj — pf)), i = 1,..., d and plot the averages in the 
upper half of Figures 3 and 4. Similarly, we plot averages of individually 
calculated GR of bootstrap confidence intervals for pi, z = l,...,d in the 
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RMSE, n=100 


RMSE, n;:200 


RMSE, n=S00 





CR, n=100 


CR, n=200 


CR, n=500 





Fig. 3. Average RMSE for estimating Vax[y/n[Xi — ^i)), i = l,...,d and average CR 
of bootstrap confidence intervals for pLi, i = 1,... ,d, by MLPB (solid), AR-sieve (dashed), 
MBB (dotted) and TBB (dash-dotted) with data-based optimal tuning parameter choices 
are reported vs. the dimension d£ {2,..., 10} for the VMAd(l) model with sample size 
n£ {100,200,500}. MLPB with individual (grey) and global (black) banding parameter 
choice are reported. 


lower halfs. All tuning parameters are chosen in a data-based and optimal 
way, as described in Section 6.1, and to reduce computation time, the less 
demanding algorithm, as described in Section 5.4 with S = (in/500, is used. 

For the VMA(l) DGPs in Figure 3, the MLPB with individual banding 
parameter choice outperforms the other approaches essentially for all time 
series dimension under consideration with respect to averaged RMSE and 
CR. In particular, larger time series dimensions do not seem to have a large 
effect on the performance of all bootstraps for the VMA(l) DGPs, with the 
only exception being the MLPB with global banding parameter choice. In 
particular, the latter is clearly inferior in comparison to the MLPB with in¬ 
dividually chosen banding parameter, which might be explained by sparsity 
of the covariance matrix T6dn- 

In Figure 4, for the VAR(l) DGPs, the picture is different from the 
VMA(l) case above. The influence of larger time series dimension on RMSE 
(and less pronounced for CR) performance is much more pronounced and 
clearly visible. In particular, the RMSE blows up with increasing dimension 
d for all four bootstrap methods, which is due to the also increasing variance 
of the process. Note that the zig-zag shape of the RMSE curves is due to 
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RMSE, n=100 RMSE, n=200 RMSE, n=500 



Fig. 4. As in Figure 3, but with VARd(l) model. 


the back and forth switching from 0.9 to 0.5 on the diagonal of A. As al¬ 
ready observed for the VMA(l) DGPs, the MLPB with individual banding 
parameter choice again performs best over essentially all time series dimen¬ 
sions with respect to average RMSE and average CR. In particular, MLPB 
with individual choice is superior to the global choice. Here, the good per¬ 
formance of the MLPB is somewhat surprising as the VAR(l) DGPs have 
rather slowly decreasing autocovariance structure, where we expected an 
AR-sieve to be more suitable. 
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SUPPLEMENTARY MATERIAL 

Additional proofs, simulations and a real data example 

(DOI: 10.1214/14-AOS1301SUPP; .pdf). In the supplementary material we 
provide proofs, additional supporting simulations and an application of the 
MLPB to German stock index data. The supplementary material to this 
paper is also available online at http://www.math.ucsd.edu/-politis/ 
PAPER/MLPBsupplement.pdf . 
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